Baseball was the first major sport to embrace analytics. The game lends itself to recording data: at any given moment it is in one of a small number of possible states, and each pitch has a small number of possible outcomes.

One could do a term project on the development of analytics in sports, from Bill James, to Moneyball, to the present Statcast era.

Rules Overview

A brief summary for those unfamiliar with the game:

  • Nine players per side (with substitutions available)
  • Nine innings per game. Each inning has two halves.
  • In each half inning one team bats, the other plays the field.
  • The basic unit of action in baseball is the pitch:
    • the pitcher pitches a ball over home plate to be received by the catcher.
    • the batter tries to hit the pitch into play.
    • If the ball is hit into play, the batter tries to advance to 1st base (or beyond!).
  • Possible outcomes of any one pitch:
    • ball
    • hit by pitch
    • swinging strike
    • called strike
    • foul ball
    • hit into play
  • The hitting team scores a run when a player advances around the bases to return to home plate.
  • The team with the most runs after 9 innings wins. (No ties, so ‘extra’ innings if necessary.)
  • The scoreboard:
    Team     1  2  3  4  5  6  7  8  9    R   H  E
    visitor  0  1  2  0  0  1  0  0  0    4  13  1
    home     2  1  0  0  3  0  1  1  x    8   9  0
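  • The 9 defensive positions of the fielding team (pitcher, catcher, first base, second base, third base, shortstop, left field, center field, right field):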

source: whatisbaseball.com/wp-content/uploads/2019/04/Diagram-of-BB.png

Possible states

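Baseball has a small number (24) of possible game states at the start of each plate appearance: 3 possible out counts (0, 1, or 2) times 2^3 = 8 ways the three bases can be occupied. In the base columns, 1 means a runner is on that base and 0 means it is empty: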
outs  3rd base  2nd base  1st base
0     0         0         0
0     0         0         1
0     0         1         0
0     0         1         1
0     1         0         0
0     1         0         1
0     1         1         0
0     1         1         1
1     0         0         0
1     0         0         1
1     0         1         0
1     0         1         1
1     1         0         0
1     1         0         1
1     1         1         0
1     1         1         1
2     0         0         0
2     0         0         1
2     0         1         0
2     0         1         1
2     1         0         0
2     1         0         1
2     1         1         0
2     1         1         1
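
The table above can be generated rather than typed out. A minimal sketch in Python, assuming each state is represented as a tuple (outs, 3rd, 2nd, 1st); the tuple layout and the name `states` are illustrative choices, not part of any standard:

```python
from itertools import product

# Each state is (outs, third, second, first).
# outs is 0, 1, or 2; each base flag is 1 if a runner is on that base, else 0.
states = list(product(range(3), (0, 1), (0, 1), (0, 1)))

print(len(states))  # 24 = 3 out counts * 2**3 base-occupancy patterns

# product() varies the last position fastest, so the rows come out
# in the same order as the table: outs slowest, 1st base fastest.
for outs, third, second, first in states:
    print(outs, third, second, first)
```

Keeping the state as a small tuple makes it easy to use as a dictionary key later, for example when tabulating how many runs score from each state.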

A quick history of keeping statistics

Historically important statistics

Batting

  • Hits (H)
  • Home runs (HR)
  • Runs batted in (RBI)
  • Batting average (AVG). Batting average is defined as the ratio H/AB, where
    • H = hits, defined as the sum of singles, doubles, triples, and home runs; and
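    • AB is ‘at bats’, defined as \[\text{AB} = \text{PA} - (\text{BB} + \text{HBP} + \text{SF} + \text{SH} + \text{CI}),\] where PA is plate appearances, BB is walks (bases on balls), HBP is hit by pitch, SF is sacrifice flies, SH is sacrifice hits (bunts), and CI is catcher's interference.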
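
As a worked example of the two definitions above, here is a minimal Python sketch. The function names and the season line are hypothetical (the numbers are invented for illustration and do not describe a real player):

```python
def at_bats(pa, bb, hbp, sf, sh, ci):
    """AB = PA - (BB + HBP + SF + SH + CI)."""
    return pa - (bb + hbp + sf + sh + ci)


def batting_average(singles, doubles, triples, home_runs, ab):
    """AVG = H / AB, where H is the sum of all four hit types."""
    hits = singles + doubles + triples + home_runs
    return hits / ab


# Hypothetical season line (invented numbers).
ab = at_bats(pa=650, bb=60, hbp=5, sf=5, sh=0, ci=0)
avg = batting_average(singles=110, doubles=35, triples=5, home_runs=30, ab=ab)

print(ab)            # 580
print(f"{avg:.3f}")  # 0.310
```

Note that walks and sacrifices lower AB, so they neither help nor hurt a batting average.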

The hard-to-achieve hitting triple crown: leading the league in AVG, HR, and RBI in the same season.

Other important offensive stats, historically:

  • Runs scored (R)
  • Stolen bases (SB)
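  • Slugging percentage (SLG), \[ \text{SLG} = \frac{\text{H} + \text{2B} + 2 \times \text{3B} + 3 \times \text{HR}}{\text{AB}},\] where 2B and 3B denote the number of doubles and triples. The numerator equals total bases: every hit counts once in H, and each extra-base hit adds its additional bases.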
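
The slugging formula can be checked against the more familiar "total bases" count (1 per single, 2 per double, 3 per triple, 4 per home run). A minimal Python sketch, with hypothetical function names and an invented stat line:

```python
def slugging(hits, doubles, triples, home_runs, ab):
    """SLG = (H + 2B + 2*3B + 3*HR) / AB."""
    return (hits + doubles + 2 * triples + 3 * home_runs) / ab


def total_bases(singles, doubles, triples, home_runs):
    """Total bases counted directly from each hit type."""
    return singles + 2 * doubles + 3 * triples + 4 * home_runs


# Hypothetical season line (invented numbers).
singles, doubles, triples, home_runs, ab = 110, 35, 5, 30, 580
hits = singles + doubles + triples + home_runs  # 180

slg = slugging(hits, doubles, triples, home_runs, ab)
tb = total_bases(singles, doubles, triples, home_runs)

print(tb)            # 315
print(f"{slg:.3f}")  # 0.543
```

Both routes give the same numerator, which is why the formula only needs H plus the extra-base hit counts.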

Pitching

  • Wins (W)
  • Strikeouts (K)
  • Earned run average (ERA), defined as \[ERA = \frac{ER}{IP} \times 9,\] where ER represents the number of earned runs allowed by the pitcher, and IP represents the number of innings pitched.
  • WHIP (walks plus hits per inning pitched), \[WHIP = \frac{BB + H}{IP}\]
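
A minimal Python sketch of the ERA and WHIP formulas. The function names and the pitcher's line are hypothetical (invented numbers). It assumes IP is passed as an ordinary number, so 6⅓ innings is 19/3, not the box-score shorthand "6.1":

```python
def era(earned_runs, innings_pitched):
    """ERA = ER / IP * 9: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched


def whip(walks, hits, innings_pitched):
    """WHIP = (BB + H) / IP: baserunners allowed per inning."""
    return (walks + hits) / innings_pitched


# Hypothetical season line (invented numbers).
er, ip, bb, h = 60, 200, 50, 170

print(f"{era(er, ip):.2f}")       # 2.70
print(f"{whip(bb, h, ip):.2f}")   # 1.10
```

Scaling ERA by 9 expresses it on the scale of a full game, whereas WHIP is left as a per-inning rate.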

  • These statistics were (are?) of central importance in evaluating the greatness of players (MVP awards and admission to the Hall of Fame).
  • But are these statistics the best measure of player performance and, more importantly, team performance?

Advanced statistics